V.A. Traag 1,2,*

1 Royal Netherlands Institute of Southeast Asian and Caribbean Studies 

2 e-Humanities Group, Royal Netherlands Academy of Arts and Sciences

Many complex networks exhibit a modular structure of densely connected groups of nodes. Usually, such a modular structure is uncovered by the optimization of some quality function. Although flawed, modularity remains one of the most popular quality functions. The Louvain algorithm was originally developed for optimizing modularity, but has been applied to a variety of methods. As such, speeding up the Louvain algorithm enables the analysis of larger graphs in a shorter time for various methods. We here suggest to consider moving nodes to a random neighbor community, instead of the best neighbor community. Although incredibly simple, it reduces the theoretical runtime complexity from $O(m)$ to $O(n \log \langle k \rangle)$ in networks with a clear community structure. In benchmark networks, it speeds up the algorithm roughly 2-3 times, while in some real networks it even reaches 10 times faster runtimes. This improvement is due to two factors: (1) a random neighbor is likely to be in a “good” community; and (2) random neighbors are likely to be hubs, helping the convergence. Finally, the performance gain only slightly diminishes the quality, especially for modularity, thus providing a good quality-performance ratio. However, these gains are less pronounced, or even disappear, for some other measures such as significance or surprise.


I. INTRODUCTION 

Complex networks have gained attention over the past
decade [1]. Especially with the rise of social media, social
networks of unprecedented size became available, which 
contributed to the establishment of the computational 
social sciences [2, 3]. But networks are also common in 
disciplines such as biology [4] and neurology [5]. Many 
of these networks share various common characteristics. 
They often have skewed degree distributions [6], show a 
high clustering and a low average path length [7]. Nodes 
often cluster together in dense groups, usually called 
communities. Nodes in a community often share other 
characteristics: metabolites show related functions [8]
and people have a similar background [9]. Revealing the 
community structure can thus help to understand the 
network [10]. 

Modularity [11] remains one of the most popular measures in community detection, even though it is flawed. There have been many algorithms suggested for optimizing modularity. The original algorithm [11] created a full dendrogram and used modularity to decide on a cutting point. It was quite slow, running in $O(n^2 m)$, where $n$ is the number of nodes and $m$ the number of links. Many algorithms were quickly introduced to optimize modularity, such as extremal optimization [12], simulated annealing [13, 14], spectral methods [15], greedy methods [16], and many other methods [10]. One of the fastest and most effective algorithms is the Louvain algorithm [17], believed to be running in $O(m)$. It has been shown to perform very well in comparative benchmark tests [18]. The algorithm is largely independent of the objective function to optimize, and as such has been used for different methods [19-24].

We first briefly describe the algorithm, and introduce the terminology. We then describe our simple improvement, which we call the random neighbor Louvain, and argue why we expect it to function well. We derive estimates of the runtime complexity, and obtain $O(m)$ for the original Louvain algorithm, in line with earlier results, and $O(n \log \langle k \rangle)$ for our improvement, where $\langle k \rangle$ is the average degree. This makes it one of the fastest algorithms for community detection to optimize an objective function. Whereas the original algorithm runs in linear time with respect to the number of edges, the random neighbor algorithm is nearly linear with respect to the number of nodes. Finally, we show on benchmark tests and some real networks that this minor adjustment indeed leads to reductions in running time, without losing much quality. These gains are especially visible for modularity, but less clear for other measures such as significance and surprise.


II. LOUVAIN ALGORITHM 

Community detection tries to find a “good” partition for a certain graph. In other words, the input is some graph $G = (V, E)$ with $n = |V|$ nodes and $m = |E|$ edges. Each node $i$ has $k_i$ neighbors, which is called the degree, which on average is $\langle k \rangle = \frac{2m}{n}$. The output is some partition $\mathcal{V} = \{V_1, V_2, \ldots, V_r\}$, where each $V_c \subseteq V$ is a set of nodes we call a community. We work with non-overlapping communities, such that $V_c \cap V_d = \emptyset$ for all $c \neq d$, and all nodes have to be in a community, so that $\bigcup_c V_c = V$. Alternatively, we denote by $\sigma_i$ the community of node $i$, such that $\sigma_i = c$ if (and only if) $i \in V_c$. Both $\sigma$ and $\mathcal{V}$ may be used interchangeably to refer to the partition. If the distinction is essential, we will explicitly state this.
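
The two ways of writing down a partition carry the same information and can be converted into each other. The following is a minimal Python sketch, assuming nodes are labelled $0, \ldots, n-1$ and communities are numbered by position; the function names are illustrative and not taken from the implementation discussed in this paper.

```python
def membership_from_communities(communities):
    """Turn a list of node sets V_1..V_r into a membership list sigma."""
    n = sum(len(nodes) for nodes in communities)
    sigma = [None] * n
    for c, nodes in enumerate(communities):
        for i in nodes:
            # Non-overlapping: every node may be assigned only once.
            assert sigma[i] is None, "communities overlap"
            sigma[i] = c
    # Covering: every node must belong to some community.
    assert None not in sigma, "some node has no community"
    return sigma


def communities_from_membership(sigma):
    """Turn a membership list sigma into a list of node sets."""
    groups = {}
    for i, c in enumerate(sigma):
        groups.setdefault(c, set()).add(i)
    return [groups[c] for c in sorted(groups)]
```

The membership list is the more convenient form for moving a single node, since only one entry changes.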

The Louvain algorithm is suited for optimizing a single objective function that specifies some quality of a partition. We denote such an objective function with $\mathcal{H}$, which should be maximized. We use $\mathcal{H}(\sigma)$ and $\mathcal{H}(\mathcal{V})$ to mean the same thing. There are various choices for such an objective function, such as modularity [11], Potts models [13, 19, 22], significance [25], surprise [26], Infomap [21] and many more. We will not specify any of the objective functions here, nor shall we discuss their (dis)advantages, as we focus on the Louvain algorithm as a general optimization scheme.

Briefly, the Louvain algorithm works as follows. The algorithm initially starts out with a partition where each node is in its own community (i.e. $\sigma_i = i$), which is the initial partition. So, initially, there are as many communities as there are nodes. The algorithm moves around nodes from one community to another, to try to improve $\mathcal{H}(\sigma)$. We denote by $\Delta\mathcal{H}(\sigma_i \mapsto c)$ the difference in moving node $i$ to another community $c$. In particular, $\Delta\mathcal{H}(\sigma_i \mapsto c) = \mathcal{H}(\sigma') - \mathcal{H}(\sigma)$, where $\sigma'_j = \sigma_j$ for all $j \neq i$ and $\sigma'_i = c$, implying that if $\Delta\mathcal{H}(\sigma_i \mapsto c) > 0$, the objective function $\mathcal{H}$ is improved. At some point, the algorithm can no longer improve $\mathcal{H}$ by moving around individual nodes, at which point it aggregates the graph, and reiterates on the aggregated graph. We repeat this procedure as long as we can improve $\mathcal{H}(\sigma)$. The outline of the algorithm is displayed in Algorithm 1.
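
To make the definition of $\Delta\mathcal{H}$ concrete, the sketch below evaluates it literally as $\mathcal{H}(\sigma') - \mathcal{H}(\sigma)$. As an assumption for illustration only, it takes modularity with a configuration null model in its standard form, $\mathcal{H} = \sum_c \left[ e_c/m - (K_c/2m)^2 \right]$, with $e_c$ the number of internal edges and $K_c$ the total degree of community $c$; this formula and the names `modularity` and `delta_H` are not specified in this section. Recomputing $\mathcal{H}$ from scratch is far slower than the local update a real implementation would use.

```python
def modularity(adj, sigma):
    """Configuration-model modularity of an undirected, unweighted graph.

    adj maps each node to a list of neighbors, sigma maps node -> community.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())
    internal = {}  # internal edge endpoints per community (2 * e_c)
    degree = {}    # total degree per community (K_c)
    for i, nbrs in adj.items():
        c = sigma[i]
        degree[c] = degree.get(c, 0) + len(nbrs)
        internal[c] = internal.get(c, 0) + sum(1 for j in nbrs if sigma[j] == c)
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def delta_H(adj, sigma, i, c, H=modularity):
    """Difference in quality when node i moves to community c."""
    moved = dict(sigma)  # sigma' equals sigma except for node i
    moved[i] = c
    return H(adj, moved) - H(adj, sigma)


# Two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
singletons = {i: i for i in adj}
print(delta_H(adj, singletons, 0, singletons[1]) > 0)  # merging neighbors helps
```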

There are two key procedures: MoveNodes and Aggregate. The MoveNodes procedure displayed in Algorithm 1 loops over all nodes (in random order), and considers moving them to an alternative community. This procedure relies on SelectCommunity to select a (possibly) better community $c$. Only if the improvement $\Delta\mathcal{H}(\sigma_v \mapsto c) > 0$, we will actually move the node to community $c$. The Aggregate procedure may depend on the exact quality function $\mathcal{H}$ used. In particular, the aggregate graph $G'$ should be constructed according to $\sigma$, such that $\mathcal{H}(G', \sigma') = \mathcal{H}(G, \sigma)$, where $\sigma'_i = i$ is the initial partition. That is, the quality of the initial partition $\sigma'$ of the aggregated graph $G'$ should be equal to the quality of the partition $\sigma$ of the original graph $G$. In Algorithm 1 a version is displayed which is suited for modularity. Other methods may require additional variables to be used when aggregating the graph (e.g. [19]).

The only procedure that remains to be specified is SelectCommunity. In the original Louvain algorithm, this procedure commonly considers all possible neighboring communities, and then greedily selects the best community. It is summarized in Algorithm 2.

We created a new flexible and fast implementation of the Louvain algorithm in C++ for use in Python using igraph. The implementation of the algorithm itself is quite detached from the objective function to optimize. In particular, all that is required to implement a new objective function is the difference when moving a node $\Delta\mathcal{H}$ and the quality function $\mathcal{H}$ itself (although the latter is not strictly necessary). This implementation is available


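function Louvain(Graph $G$)
    $\sigma_i \leftarrow i$                                      ▷ Initial partition
    $\sigma' \leftarrow$ MoveNodes($G$)                          ▷ Initial move nodes
    while $\mathcal{H}(\sigma') > \mathcal{H}(\sigma)$ do
        $\sigma \leftarrow \sigma'$
        $G \leftarrow$ Aggregate($G$, $\sigma$)
        $\Sigma \leftarrow$ MoveNodes($G$)                       ▷ Move nodes
        $\sigma'_i \leftarrow \Sigma_{\sigma_i}$ for all $i$     ▷ Correct $\sigma'$ according to $\Sigma$
    end while
    return $\sigma'$
end function

function MoveNodes(Graph $G$)
    $\sigma_i \leftarrow i$ for $i = 1, \ldots, |V(G)|$          ▷ Initial partition
    $q \leftarrow -\infty$
    while $\mathcal{H}(\sigma) > q$ do
        $q \leftarrow \mathcal{H}(\sigma)$
        for random $v \in V(G)$ do
            $c \leftarrow$ SelectCommunity($v$)
            if $\Delta\mathcal{H}(\sigma_v \mapsto c) > 0$ then
                $\sigma_v \leftarrow c$
            end if
        end for
    end while
    return $\sigma$
end function

function Aggregate(Graph $G$, Partition $\sigma$)
    $A \leftarrow$ Adjacency($G$)
    $A'_{cd} \leftarrow \sum_{ij} A_{ij} \delta(\sigma_i, c) \delta(\sigma_j, d)$
    return $A'$
end function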

ALGORITHM 1. Louvain method. The algorithm loops over 
all nodes and moves nodes to alternative communities. When 
no more improvement can be made, it aggregates the graph 
and reiterates the procedure. 
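
The Aggregate step of Algorithm 1 can be sketched directly from its defining sum, $A'_{cd} = \sum_{ij} A_{ij} \delta(\sigma_i, c) \delta(\sigma_j, d)$. The version below is a minimal illustration in plain Python, assuming a dense adjacency matrix given as a list of lists and a membership list `sigma`; both the data layout and the function name are assumptions made for this example, and a practical implementation would use sparse structures.

```python
def aggregate(A, sigma):
    """Collapse each community into one node, summing the edge weights.

    A is a dense adjacency matrix (list of lists), sigma[i] is the
    community of node i. Returns the aggregated matrix A'.
    """
    labels = sorted(set(sigma))
    index = {c: a for a, c in enumerate(labels)}  # community -> new node id
    r = len(labels)
    A_agg = [[0] * r for _ in range(r)]
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            # delta(sigma_i, c) * delta(sigma_j, d) picks exactly one cell.
            A_agg[index[sigma[i]]][index[sigma[j]]] += a_ij
    return A_agg


# Two triangles joined by the single edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(aggregate(A, [0, 0, 0, 1, 1, 1]))
```

Internal edges end up on the diagonal of the aggregated matrix (counted twice for a symmetric $A$), so the total weight is preserved, which is what allows a quality function such as modularity to take the same value on the aggregated graph.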

open source from GitHub (footnote 1) and PyPI (footnote 2).


III. IMPROVEMENT 

Not surprisingly, the Louvain algorithm generally spends most of its time contemplating alternative communities. While profiling our implementation, we found that it spends roughly 95% of the time calculating the difference $\Delta\mathcal{H}(\sigma_v \mapsto c)$ in Algorithm 2. Much of this time is spent moving around nodes for the first time. With an initial partition where each node is in its own community, almost any neighboring community would be an improvement. Moreover, when the algorithm has progressed a bit, many neighbors likely belong to the same community. We therefore suggest that instead of considering all neighboring communities, we simply select a random neighbor, and consider that community (as stated in Algorithm 2), which we call the random


1 https://github.com/vtraag/louvain-igraph 

2 https://pypi.python.org/pypi/louvain
FIG. 1. Clique. The original Louvain algorithm considers all communities, which leads to $E(t) = O(n_c^2)$ operations for putting all $n_c$ nodes of a clique in a single community. The improvement considers only random neighbors, which takes only $E(t) = O(n_c \log n_c)$ operations to identify the whole clique as a community. In (a) we show the number of operations in a simulation, with the markers indicating the simulated number of operations, and the solid lines the analytically derived estimates. In (b)-(e) we show the actual time used when optimizing a clique for the different objective functions. The solid lines in (b)-(e) denote best fits to $n_c^2$ and $n_c \log n_c$ in log-space.


Select best neighbor community
function SelectCommunity(Node $v$)
    $\delta \leftarrow -\infty$
    $c \leftarrow \sigma_v$
    $C \leftarrow \{\sigma_u \mid (uv) \in E(G)\}$               ▷ Neighbor communities
    for Community $c' \in C$ do
        if $\Delta\mathcal{H}(\sigma_v \mapsto c') > \delta$ then
            $\delta \leftarrow \Delta\mathcal{H}(\sigma_v \mapsto c')$
            $c \leftarrow c'$
        end if
    end for
    return $c$
end function

Select random neighbor community
function SelectCommunity(Node $v$)
    return $\sigma_u$ for a uniformly random neighbor $u$ with $(uv) \in E(G)$
end function

ALGORITHM 2. Select the best or a random neighbor community.
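
Both variants of SelectCommunity are short enough to sketch in Python. This is an illustration under stated assumptions, not the C++ implementation described in the paper: `adj` is a dict of neighbor lists, `sigma` a dict from node to community, and `delta_H` a caller-supplied function returning the quality difference for moving node `v` to community `c`.

```python
import random


def select_best_community(v, adj, sigma, delta_H):
    """Original Louvain: evaluate every neighboring community, keep the best."""
    best_delta, best_c = float("-inf"), sigma[v]
    for c in {sigma[u] for u in adj[v]}:  # distinct neighbor communities
        d = delta_H(v, c)                 # one evaluation per community
        if d > best_delta:
            best_delta, best_c = d, c
    return best_c


def select_random_neighbor_community(v, adj, sigma, rng=random):
    """Random neighbor Louvain: community of one uniformly chosen neighbor."""
    return sigma[rng.choice(adj[v])]
```

The first function calls `delta_H` once per neighboring community, while the second makes no call at all; the single evaluation of the difference happens afterwards in MoveNodes, when the move is accepted or rejected.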


neighbor Louvain. Notice that the selection of a random neighbor makes the greedy Louvain algorithm less greedy and thus more explorative. Indeed, when also accepting moves with some probability depending on the improvement (possibly also accepting degrading moves), the algorithm comes close to resembling simulated annealing [13, 14]. However, simulated annealing is rather slow for community detection [18], so we do not explore that direction further, since we are interested in speeding up the algorithm.

There are several advantages to the selection of a random neighbor. First of all, it is likely to choose a relatively “good” community. In general, a node should be in a community to which relatively many of its neighbors belong as well (although this of course depends on the exact quality function). By selecting a community from among its neighbors, there is a good chance that a relatively good community is picked. In particular, if node $i$ has $k_i(c)$ neighbors in community $c$, the probability that community $c$ will be considered for moving is $k_i(c)/k_i$. The probability for selecting a community is thus proportional to the number of neighbors in that community. Bad communities (with relatively few neighbors) are less frequently sampled, so that the algorithm focuses more on the promising communities (those with relatively many neighbors).

FIG. 2. Network size. We here show the ratio of the time and of the quality (i.e. $\mathcal{H}$) of the uncovered partitions by the original Louvain algorithm and the random neighbor Louvain. The random neighbor Louvain algorithm is 2-3 times faster than the original Louvain algorithm and at some points even faster for clear communities in (a) when using $\mu = 0.1$. However, for less clear communities at $\mu = 0.8$ as displayed in (b), the optimization of significance and surprise is not faster by using the random neighbor Louvain. The random neighbor Louvain uncovers almost the same quality as the original version for a large part, as shown in (c) and (d). However, especially for surprise, the quality is adversely affected by the random neighbor Louvain for $\mu = 0.1$, shown in (c). The results are based on benchmark graphs with communities of size $n_c = 1000$ and an average degree of $\langle k \rangle = 15$.

Moreover, when considering the initial partition of each node in its own community, almost any move would improve the quality function $\mathcal{H}$. The difference between alternative communities in this early stage is likely to be marginal. Any move that puts two nodes in the same community is probably better than a node in its own community. Such moves quickly reduce the number of communities from roughly $n$ to $n/2$. But instead of considering every neighboring community as in the original Louvain algorithm, which takes roughly $O(\langle k \rangle)$, our random neighbor Louvain algorithm only considers a single random neighbor, which takes constant time $O(1)$. So, for the first few iterations, Louvain runs in $O(n \langle k \rangle) = O(m)$, whereas selecting a random neighbor runs in $O(n)$.

FIG. 3. Effect of community size. Results in (a) show that the speedup ratio increases with the community size for $\mu = 0.1$. Surprise and significance find smaller substructures within large communities as seen in (c). This is also when the improvement starts to deteriorate. The results for large communities in (a) and (c) are very similar to the situation when $\mu = 0.8$ in (b) and (d), which resembles a random graph more closely. The speedup ratio in (b) corresponds to this: the speedup is rather large for modularity, while it is much lower for surprise and significance. In (e) and (f) we show that heterogeneity in the community sizes hardly impacts the speedup ratio. In that case we generate LFR benchmark graphs with smallest community size $n_c = 10$ and the maximum community size varies from 2 to 10 times as large. We use $n = 10^5$ and $\langle k \rangle = 10$ for both benchmarks.

Notice there is a big difference between (1) selecting a random neighbor and then its community and (2) selecting a random community from among the neighboring communities. The first method selects a community proportional to the number of neighbors that are in that community, while the second method selects a community uniformly from the set of neighboring communities. Consider for example a node that is connected to two communities, and has $k_i - 1$ neighbors in the first community and only 1 in the other community. When selecting a community of a random neighbor, the probability the good community is considered is $1 - \frac{1}{k_i}$, while the probability is only $\frac{1}{2}$ when selecting a random community.
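
The difference between the two selection schemes can be checked with exact fractions. The sketch below is illustrative only: it takes the list of communities of a node's neighbors (one entry per neighbor) and returns the selection probabilities under each scheme; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def random_neighbor_probs(neighbor_communities):
    """P(c) = k_i(c) / k_i: pick a neighbor first, then take its community."""
    k_i = len(neighbor_communities)
    counts = Counter(neighbor_communities)
    return {c: Fraction(k_c, k_i) for c, k_c in counts.items()}


def random_community_probs(neighbor_communities):
    """P(c) = 1 / |C|: pick uniformly among the distinct communities."""
    distinct = set(neighbor_communities)
    return {c: Fraction(1, len(distinct)) for c in distinct}


# A node with k_i = 10 neighbors: 9 in one community, 1 in another.
neighbors = ["good"] * 9 + ["other"]
print(random_neighbor_probs(neighbors)["good"])   # 1 - 1/k_i
print(random_community_probs(neighbors)["good"])  # 1/2
```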

Secondly, random selection of a neighbor increases the likelihood of quick convergence. The probability that node $i$ is selected as a random neighbor is roughly $k_i/2m$, resembling preferential attachment [27] in a certain sense. Hubs are thus more likely to be chosen as a candidate community. Since hubs connect many vertices, there is a considerable probability that two nodes consider the same hub. If these two (or more) nodes (and the hub) should in fact belong to the same community, chances are high both nodes and the hub quickly end up in the same community.

As an illustration of this advantage, consider a hubs-and-spokes structure, with one central hub and only neighboring spokes that are connected to each other (and always to the hub). So, any spoke node $i$ is connected to nodes $i - 1$ and $i + 1$ and to the central hub, node $n$. Consider for simplicity that the nodes are considered in order and that every move will be advantageous. The probability that the first node will move to community $n$ is $p_1 = \frac{1}{3}$. The second node will move to community $n$ if it chooses node $n$ immediately (which happens with probability $\frac{1}{3}$), or if it chooses node 1, and node 1 moved to community $n$, so that $p_2 = \frac{1}{3} + p_1 \frac{1}{3}$. Similarly, for the other nodes $p_i = \frac{1}{3} + p_{i-1} \frac{1}{3} = \sum_{j=1}^{i} \left(\frac{1}{3}\right)^j$, which goes to $\frac{1}{2}$ for $n \to \infty$. This is higher than when just considering a random neighbor community. In that case, the probability the first node will move to community $n$ is still $\frac{1}{3}$. But for the second node, if node 1 moved to community $n$, only two communities are left: $n$ and 3. In that case, community $n$ is chosen with probability $\frac{1}{2}$. If node 1 did not move to community $n$, then node 2 will move to community $n$ with probability $\frac{1}{3}$. In general, node $i$ moves to community $n$ with probability $p_i = p_{i-1} \frac{1}{2} + (1 - p_{i-1}) \frac{1}{3} = p_{i-1} \frac{1}{6} + \frac{1}{3}$. Working out the recurrence, we obtain that $p_i = \frac{1}{3} \sum_{j=0}^{i-1} \left(\frac{1}{6}\right)^j$, which tends to $\frac{2}{5}$. Selecting a community of a random neighbor thus works better than selecting a random community from among the neighbors. Selecting a community of a random node is even worse. In that case, the probability is $p_i = \frac{1}{n} \left(1 + \frac{1}{n}\right)^{i-1}$, which tends to 0 for $n \to \infty$. In short, selecting the community of a random neighbor is likely to choose a new community that will also be chosen by other nodes.
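
The three probabilities in the hub-and-spokes example can be verified numerically. The sketch below iterates the two recurrences with exact fractions and evaluates the closed form for the random-node case; it follows the simplifying assumptions of the example (nodes visited in order, every move accepted), and the function names are chosen here for illustration.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)


def p_random_neighbor(i):
    """p_i = 1/3 + p_{i-1}/3 with p_0 = 0 (community of a random neighbor)."""
    p = Fraction(0)
    for _ in range(i):
        p = THIRD + p * THIRD
    return p


def p_random_neighbor_community(i):
    """p_i = p_{i-1}/2 + (1 - p_{i-1})/3 with p_1 = 1/3 (random community)."""
    p = THIRD
    for _ in range(i - 1):
        p = p * Fraction(1, 2) + (1 - p) * THIRD
    return p


def p_random_node(i, n):
    """p_i = (1/n)(1 + 1/n)^(i-1) (community of a random node)."""
    return Fraction(1, n) * (1 + Fraction(1, n)) ** (i - 1)


for i in (1, 2, 5, 50):
    print(i, float(p_random_neighbor(i)), float(p_random_neighbor_community(i)))
```

Both recurrences converge within a handful of steps, so the gap between the two limits is already visible for the first few spokes.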

In summary, selecting a random neighbor should work well because of two reasons. First, it tends to focus on communities that are “good”. Secondly, it should help in convergence because of higher likelihood of selecting hubs. In particular, the evaluation of SelectCommunity in the random neighbor Louvain takes a constant time $O(1)$ whereas evaluating all communities takes about $O(\langle k \rangle)$. However, one essential question is whether SelectCommunity will not be too frequently evaluated in the random neighbor Louvain to counter this benefit.

To study this question, let us consider a ring of $r$ cliques of $n_c$ nodes each. The cliques (which are complete subgraphs containing $\binom{n_c}{2}$ links) are connected to another clique only by a single link in a circular fashion (i.e. clique $i$ is connected only to clique $i - 1$ and $i + 1$). Most methods tend to find the cliques (or sets of multiple cliques due to the resolution limit [19, 28]). Indeed, it is one of the best possible community structures: we cannot add any more internal edges, nor can we delete any external edges without disconnecting the graph. However, for the runtime complexity, the external edges will play only a marginal role. We may therefore simply assume we will work with $r$ disconnected cliques of size $n_c$. Although the actual runtime will deviate from this, it should provide a reasonable runtime for relatively “clear” communities, and as such provide a lower bound for more difficult communities.

TABLE I. Empirical network overview.

Network                 $n$        $m$          $\langle k \rangle$
Health                  2 539      12 969       10.22
Brightkite              58 228     214 078      7.35
Facebook                63 731     817 035      25.64
Author Collaboration    22 908     2 673 133    233.38
Web (Google)            875 713    5 105 039    11.66
Web (Berk./Stan.)       685 230    7 600 595    22.18

FIG. 4. Empirical network results. The random neighbor Louvain usually speeds up the optimization of the objective function for most empirical networks. For the hyperlink network from Google it does not work for any method, while the adolescent health dataset poses problems for optimizing CM modularity. The quality remains relatively similar compared to the original, especially for modularity as shown in (b). For significance and surprise the differences are more pronounced.

The core question is thus how quickly both the original and the random neighbor Louvain run on cliques. We will assume the clique should become a single community, which is likely to be the case for most methods. Additionally, we assume $\Delta\mathcal{H} > 0$ only if a node is moved to a larger community, which is likely to be the case for most methods as nodes in a clique have more links to larger communities. The complexity of the original Louvain implementation is simple to evaluate in this case. The first node will be moved to one of its neighbors, an operation that costs $n_c$ evaluations. The second node has only $n_c - 1$ evaluations to make, since the community of the first node disappeared. If we continue in this fashion, the total number of evaluations $t$ is then $\sum_{i=1}^{n_c} (n_c - i + 1) = \frac{n_c(n_c + 1)}{2} = O(n_c^2)$. The analysis of the expected runtime of the random neighbor Louvain is more difficult (see Appendix A for more details). However, we can provide a lower bound that serves as a rough estimate. Let us again denote by $t$ the total number of operations before the whole clique is identified as a single community. We divide this in different phases of the algorithm, where each phase $i$ runs from the time where there are $n_c - i + 1$ communities, until there are $n_c - i$ communities. In phase 1 we thus start out with $n_c$ communities, and in the next phase there are only $n_c - 1$ communities. If we denote by $t_i$ the number of operations in phase $i$, then by linearity $E(t) = E\left(\sum_i t_i\right) = \sum_i E(t_i)$. Notice that we will only leave phase $i$ whenever a community of size 1 disappears. The probability that a community of size 1 disappears is [...], since it will join any other community (except itself). There are at most $n_c - i + 1$ communities of size 1 in phase $i$, so that the probability a community of size 1 is selected is bounded above by $\frac{n_c - i + 1}{n_c}$. In fact, such a state is also relatively likely, as the community size distribution tends to become more skewed than a more uniform distribution due to the preferential attachment on the basis of the community sizes. The number of expected operations in phase $i$ is then bounded below by $\frac{n_c}{n_c - i + 1}$, and the expected operations in total is bounded below by

\begin{align}
E(t) &\geq n_c \sum_{i=1}^{n_c} \frac{1}{n_c - i + 1} \tag{1} \\
     &= n_c \sum_{i=1}^{n_c} \frac{1}{i} = O(n_c \log n_c). \tag{2}
\end{align}


However, this lower bound gives in fact a very accurate estimate of the expected running time, as seen in Fig. 1. Whereas the original Louvain algorithm runs in $O(n_c^2)$, the random neighbor version only uses $O(n_c \log n_c)$ to put all nodes of a clique in a single community. We used an explicit simulation of this process to validate our theoretical analysis. Running the actual algorithms on cliques yields similar results (Fig. 1).
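
A simulation of this kind can be sketched in a few lines. The version below is not the simulation used for Fig. 1 but a simplified stand-in under explicit assumptions: nodes are drawn uniformly at random instead of in shuffled passes, and a move is accepted only when the target community is at least as large as the node's current one, which is one reading of the rule that $\Delta\mathcal{H} > 0$ only for moves to a larger community. The two helper functions give the exact count for the original algorithm and the lower bound $n_c \sum_i 1/i$.

```python
import random


def original_operations(n_c):
    """Evaluations of the original Louvain on a clique: n_c (n_c + 1) / 2."""
    return n_c * (n_c + 1) // 2


def random_neighbor_lower_bound(n_c):
    """Lower bound n_c * sum_{i=1}^{n_c} 1/i for the random neighbor Louvain."""
    return n_c * sum(1.0 / i for i in range(1, n_c + 1))


def simulate_random_neighbor(n_c, rng):
    """Count operations until a clique of n_c nodes is a single community."""
    sigma = list(range(n_c))   # each node starts in its own community
    size = [1] * n_c           # community sizes
    n_communities = n_c
    operations = 0
    while n_communities > 1:
        v = rng.randrange(n_c)
        u = rng.randrange(n_c - 1)
        if u >= v:
            u += 1             # uniformly random neighbor of v in a clique
        operations += 1
        source, target = sigma[v], sigma[u]
        # Assumed acceptance rule: the move must lead to a larger community.
        if source != target and size[target] >= size[source]:
            size[source] -= 1
            size[target] += 1
            sigma[v] = target
            if size[source] == 0:
                n_communities -= 1
    return operations


rng = random.Random(42)
for n_c in (10, 100, 1000):
    runs = [simulate_random_neighbor(n_c, rng) for _ in range(20)]
    print(n_c, original_operations(n_c), sum(runs) / len(runs),
          round(random_neighbor_lower_bound(n_c), 1))
```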

To get a rough idea of the overall running time, let us translate these results back to the ring of cliques. In that case, we have $r$ cliques of $n_c$ nodes. The runtime for the original Louvain method is $O(n_c^2)$ for each clique, so that the total runtime is about $O(r n_c^2)$. One factor of $n_c$ comes from running over $n_c$ nodes, while the other factor comes from running over $\langle k \rangle \approx n_c$ neighbors. Since $r n_c = n$ and $n \langle k \rangle = 2m$, we thus obtain an overall running time of Louvain of about $O(r n_c^2) = O(n \langle k \rangle) = O(m)$, similar to earlier estimates [10, 17]. Following the same idea, we obtain an estimate of roughly $O(n \log \langle k \rangle)$ for the runtime of the random neighbor Louvain algorithm. So, whereas the original algorithm runs in roughly linear time with respect to the number of edges, the random neighbor algorithm runs in nearly linear time with respect to the number of nodes. Empirical networks are usually rather sparse, so that the difference between $\langle k \rangle$ and $\log \langle k \rangle$ is usually not that large. Still, it is quite surprising to find such an improvement for such a minor adjustment.
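
To get a feeling for the size of this difference, the following sketch computes $\langle k \rangle = 2m/n$ and $\log \langle k \rangle$ for the networks of Table I. The ratio it prints ignores all constant factors hidden in the $O(\cdot)$ estimates and should therefore be read as indicative only, not as a predicted speedup.

```python
import math

# (n, m) as listed in Table I.
NETWORKS = {
    "Health": (2539, 12969),
    "Brightkite": (58228, 214078),
    "Facebook": (63731, 817035),
    "Author Collaboration": (22908, 2673133),
    "Web (Google)": (875713, 5105039),
    "Web (Berk./Stan.)": (685230, 7600595),
}


def average_degree(n, m):
    """Average degree <k> = 2m / n of an undirected graph."""
    return 2 * m / n


for name, (n, m) in NETWORKS.items():
    k = average_degree(n, m)
    print(f"{name:22s} <k> = {k:7.2f}  log<k> = {math.log(k):5.2f}  "
          f"<k>/log<k> = {k / math.log(k):6.2f}")
```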


IV. EXPERIMENTAL RESULTS 

We use benchmark networks and real networks to show that the random neighbor improvement also reduces the runtime in practice. These benchmark networks contain a planted partition, which we then try to uncover using both the original and the random neighbor Louvain algorithm. An essential role is played by the probability $\mu$ that a link falls outside of the planted community. For low $\mu$ it is thus quite easy to identify communities, while for high $\mu$ it becomes increasingly more difficult. We report results using the speedup ratio calculated as $R_{\text{speed}} = \frac{T_{\text{orig}}}{T_{\text{rn}}}$, where $T_{\text{rn}}$ is the runtime of the random neighbor variant and $T_{\text{orig}}$ the runtime of the original Louvain method. The runtime is calculated in used CPU time, not elapsed real time. We also report the quality ratio, which is calculated as $R_{\text{qual}} = \frac{\mathcal{H}_{\text{rn}}}{\mathcal{H}_{\text{orig}}}$, where $\mathcal{H}_{\text{rn}}$ refers to the quality of the partition uncovered using the random neighbor improvement and $\mathcal{H}_{\text{orig}}$ to the quality using the original algorithm. In this way, if $R_{\text{speed}} > 1$ the random neighbor improves upon the original and similarly $R_{\text{qual}} > 1$ if the random neighbor is an improvement. Throughout all plots, error bars indicate standard errors of the mean. The Louvain algorithm can be applied to many different methods, and we here show results for (1) modularity using a configuration null model [11] (CM modularity); (2) modularity using an Erdős-Rényi null model [13] (ER modularity); (3) significance [25]; and (4) surprise [29].
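
The two ratios are straightforward to compute once CPU time is measured consistently. The helper below is a hypothetical sketch, not the measurement code used for the experiments: it uses Python's `time.process_time`, which reports CPU time of the current process rather than elapsed wall-clock time, in line with the convention stated above.

```python
import time


def cpu_time(func, *args):
    """Run func(*args) and return (CPU seconds used, result)."""
    start = time.process_time()
    result = func(*args)
    return time.process_time() - start, result


def speedup_ratio(t_orig, t_rn):
    """R_speed = T_orig / T_rn; above 1 means the random neighbor is faster."""
    return t_orig / t_rn


def quality_ratio(h_rn, h_orig):
    """R_qual = H_rn / H_orig; above 1 means the random neighbor is better."""
    return h_rn / h_orig
```

For very short runs the measured CPU time can be zero at the clock's resolution, so in practice timings would be taken over sufficiently large graphs or repeated runs before forming the ratio.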

We first test the impact of the network size as a whole. 
We construct benchmark networks ranging from $n = 10^4$
to $n = 10^7$ nodes, with equally sized communities of 1000
nodes, with a Poissonian degree distribution. The speed
and quality of the original Louvain algorithm and the
random neighbor Louvain algorithm for all four methods
are reported in Fig. 2. For all these methods, the random
neighbor Louvain speeds up the algorithm roughly 2-3 
times. At the same time, the quality of the partitions 
found remains nearly the same. 

However, surprise and significance seem to perform worse than modularity. The speedup is rather limited for higher $\mu$ (or the random neighbor variant even becomes slower than the original), in which case communities are more difficult to detect. Surprise and significance tend to find relatively smaller communities than modularity [26], suggesting that the performance gain of using the random neighbor Louvain is especially pertinent when making a relatively coarse partition. Revisiting the argument of the ring of cliques makes clear that the runtime does not necessarily scale with the degree, but rather, with the clique size, which we may approximate as the community size. Indeed, merging all the $n_c$ nodes into a single community should take $O(n_c^2)$ originally and $O(n_c \log n_c)$ in the random neighbor Louvain, as previously argued. However, if there are no clear communities present in the network, the running time will not depend on the degree as much, but rather on the sizes of the communities found. Hence, the running time should then roughly scale as $O(n n_c)$ for the original implementation and as $O(n \log n_c)$ for the random neighbor Louvain. Since surprise and significance find smaller communities than modularity (unless the communities are clearly defined), the speedup will be rather limited, whereas it will generally be larger for modularity.

We test this by generating benchmark networks with $n = 10^5$ nodes, $\langle k \rangle = 10$ and varying community sizes from 10 to 20 000. Results are displayed in Fig. 3. Indeed, for larger communities, surprise and significance have difficulties discerning such large communities, and they tend to find substructure within these large communities. Notice that modularity also merges smaller communities (thereby uncovering artificially larger communities), part of the problem of the resolution limit [28]. This is exactly also the point at which the speedup for surprise and significance goes down. Moreover, when the community structure is not clear, there is no effect of community size at all. Indeed, in that case, surprise tends to find small communities, and modularity tends to find large communities. The speedup follows this pattern: surprise and significance show very small speedups, while modularity shows larger speedups.

However, modularity also prefers rather balanced communities [30], so that perhaps modularity performs rather well because of the similarity in community sizes. We therefore also consider the impact of more heterogeneity by constructing LFR benchmark networks [31]. In these benchmark graphs the community sizes and the degree both follow power-law distributions with exponents 1 and 2 respectively. The maximum degree was set at $2.5 \langle k \rangle$, while the minimum community size was set at $\langle k \rangle$ for $\langle k \rangle = 10$. We varied the maximum community size from $2 \langle k \rangle$ to $10 \langle k \rangle$. These results are displayed in Fig. 3, from which we can see that the heterogeneity in community sizes does not affect the results.

We also tested the random neighbor Louvain on six empirical networks of varying sizes. These networks were retrieved from the Koblenz Network Collection (footnote 3). We include (1) the adolescent health dataset, a school network collected for health research [32]; (2) Brightkite, a social network site [33]; (3) a Facebook friendship network [34]; (4) an author collaboration network from the High Energy topic on arXiv [35]; (5) a web hyperlink network released by Google [ > ]; and (6) the complete web hyperlink network from the universities of Berkeley and Stanford [36]. An overview of the size of the networks is provided in Table I, and the results are displayed in Fig. 4. The random neighbor Louvain is clearly faster for most networks and methods, even reaching speedup ratios of over 10 for the hyperlink web network from Berkeley and Stanford. For the web network released by Google, however, the random neighbor variant is not faster. The quality remains relatively similar for most networks, especially for modularity, whereas the quality differs more for surprise and significance.

Notice that significance is not defined for weighted networks, 
so that significance is not run on those networks 
(health and author collaboration). But weighted networks 
raise an interesting point: is it possible to make 
use of the weight to improve the speed even more? A 
natural possibility is to sample neighbors proportional to 
the weight. Neighbors in the same community are often 
connected with a higher weight, which is part of the famous 
strength of weak ties [37, 38]. Sampling proportional to 
the weight should thus increase the chances of drawing 
a “good” community. However, this depends on the extent 
to which this correlation between weight and community 
holds. The aggregated graph is also weighted, 
allowing for weighted sampling there as well. On 
the other hand, only little time is spent in the aggregated 
iterations, making the benefit relatively small. Weighted 
sampling in constant time requires preprocessing, which 
takes an additional $O(m)$ memory and $O(m)$ time. The 
question is thus whether these costs offset the 
possible benefits. 
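
As an illustration of the preprocessing cost mentioned above, the following is a minimal sketch of one standard technique for constant-time weighted sampling, the alias method. The paper does not state which method is used, so the choice of the alias method and the names `build_alias_table` and `sample_index` are assumptions made for this example. One table would be built per node over its neighbor weights, which adds up to $O(m)$ memory and time over the whole graph.

```python
import random


def build_alias_table(weights):
    """Preprocess positive weights into (prob, alias) tables in O(len(weights))."""
    n = len(weights)
    total = float(sum(weights))
    # Rescale so that the average entry equals 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # keep s with this probability ...
        alias[s] = l          # ... otherwise return l instead
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover entries are (numerically) equal to 1.
    for i in small + large:
        prob[i] = 1.0
    return prob, alias


def sample_index(prob, alias, rng=random):
    """Draw an index proportional to the original weights in O(1)."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


# Hypothetical usage: neighbor weights of a single node.
prob, alias = build_alias_table([1.0, 2.0, 3.0, 4.0])
neighbor_position = sample_index(prob, alias, random.Random(42))
```

Each draw costs two random numbers and one table lookup, regardless of the degree of the node, which is what makes the trade-off against the preprocessing cost relevant.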


3 http://konect.uni-koblenz.de/ 

FIG. 5. Performance weighted neighbor sampling. Instead 
of sampling a neighbor randomly, it is also possible to 
sample neighbors proportional to the weight. We here test the 
performance of the unweighted neighbor sampling in (a)-(b) 
and the weighted neighbor sampling in (c)-(d). We generate 
weighted LFR benchmark networks, where the strength 
of the nodes follows the degree, $s_i = k_i^\beta$ with $\beta = 1.5$, with 
$\langle k \rangle = 10$ and $n = 10^5$. The results for the unweighted neighbor 
sampling in (a) and (b) are very similar to the results 
for the weighted neighbor sampling in (c) and (d). Taking 
into account the weight hence does not improve the random 
neighbor sampling much. 


We use weighted benchmark networks [39] to test 
whether weighted sampling speeds up the algorithm even 
further. These benchmark networks introduce an additional 
mixing parameter for the weight, $\mu_w$. Whereas the 
topological mixing parameter $\mu$ controls the probability 
of an edge outside of the community, the weight is distributed 
such that on average a proportion of about $\mu_w$ 
lies outside of the community. The strength of the nodes 
follows the degree, $s_i = k_i^\beta$ with $\beta = 1.5$, with $\langle k \rangle = 10$ 
and $n = 10^5$. The external weight $\mu_w s_i$ is spread over 
$\mu k_i$ external links, thereby leading to an average external 
weight of $\mu_w s_i / (\mu k_i)$ per link. If $\mu_w > \mu$ the external weight 
is higher than the internal weight, making it difficult to 
detect communities correctly. Intuitively, we would thus 
expect to see an improvement in the random neighbor 
selection whenever $\mu_w < \mu$, as in that case the weight 
correlates with the planted partition. The results for 
both the unweighted and the weighted random neighbor 
sampling are displayed in Fig. 5. Although the weighted 
random neighbor sampling sometimes improves on the 
unweighted variant, overall the performance is comparable. 
The results on the unweighted benchmark networks 
and the empirical networks are also very comparable (not 
shown). 
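
To make the role of the two mixing parameters concrete, the short sketch below computes the average weight per external and per internal link for a single node. The external expression follows the text; the internal expression, $(1 - \mu_w) s_i / ((1 - \mu) k_i)$, is an assumption obtained by applying the same reasoning to the remaining weight and links, and the function name is hypothetical.

```python
def average_link_weights(k, mu, mu_w, beta=1.5):
    """Return (average external, average internal) link weight of a node.

    Assumes strength s = k**beta, a fraction mu of the k links and a
    fraction mu_w of the strength lying outside the community,
    with 0 < mu < 1.
    """
    s = k ** beta
    external = mu_w * s / (mu * k)
    internal = (1.0 - mu_w) * s / ((1.0 - mu) * k)  # assumed counterpart
    return external, internal


# Hypothetical example: a node of degree 10 with mu = 0.5.
low = average_link_weights(10, mu=0.5, mu_w=0.3)   # external links are lighter
high = average_link_weights(10, mu=0.5, mu_w=0.7)  # external links are heavier
```

Under these assumptions the external links are lighter than the internal ones exactly when $\mu_w < \mu$, which is the regime in which weighted sampling would be expected to help.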

V. CONCLUSION 


Many networks seem to contain some community 
structure. Finding such communities is important across 
many different disciplines. One of the most used algorithms 
to optimize some quality function is the Louvain 
algorithm. We here showed how a remarkably simple 
adjustment leads to a clear improvement in the runtime 
complexity. We argue that the approximate runtime 
of the original Louvain algorithm should be roughly 
$O(m)$, while the improvement reduces the runtime to 
$O(n \log \langle k \rangle)$ in networks with a clear community structure. So, whereas 
the original algorithm is linear in the number of edges, 
the random neighbor algorithm is nearly linear in the 
number of nodes. 

We have tested the random neighbor algorithm extensively. 
The improvement is quite consistent across 
various settings and sizes. The runtime complexity was 
reduced, speeding up the algorithm roughly 2-3 times, 
especially when concentrating on the coarser partitions 
found by modularity. Nonetheless, some methods, such 
as surprise and significance, are more sensitive to sampling 
a random neighbor. This seems to be mostly due to 
the community size in the uncovered partition. Whereas 
modularity prefers rather coarse partitions, both significance 
and surprise prefer more refined partitions, leading 
to much smaller communities. More refined partitions offer 
fewer opportunities for improving the runtime, so that 
sampling a random neighbor provides little improvement. 

The idea could also be applied in different settings. 
For example, the label propagation method is also a very 
fast algorithm [40], but it does not consider any objective 
function. It simply puts a node in the most frequent 
neighboring community. But instead of considering every 
neighbor, it can simply choose a random neighbor, 
similar to the improvement here. We may thus expect 
a similar improvement in label propagation as for the 
Louvain algorithm. Similar improvements may be considered 
in other algorithms. The core of the idea is that 
a random neighbor is likely to be in a “good” community, 
which presumably also holds for other algorithms. 
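
The suggested variant of label propagation could be sketched as follows. This is only an illustration of the idea and has not been evaluated here; the function name, the adjacency-dictionary input and the stopping rule (a fixed number of sweeps, or a sweep without changes) are assumptions made for the example.

```python
import random


def random_neighbor_label_propagation(adjacency, sweeps=10, seed=0):
    """Each node adopts the label of one randomly drawn neighbor.

    adjacency: dict mapping each node to a list of its neighbors.
    Returns a dict mapping each node to its final label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adjacency}  # start with singleton labels
    nodes = list(adjacency)
    for _ in range(sweeps):
        rng.shuffle(nodes)
        changed = 0
        for v in nodes:
            if not adjacency[v]:
                continue  # isolated node keeps its own label
            u = rng.choice(adjacency[v])  # O(1) instead of scanning all neighbors
            if labels[u] != labels[v]:
                labels[v] = labels[u]
                changed += 1
        if changed == 0:
            break
    return labels


# Hypothetical usage: two disconnected triangles.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
labels = random_neighbor_label_propagation(graph, sweeps=20, seed=1)
```

Unlike the Louvain variant, no quality function guards the move here, so a node may also adopt a “bad” label; whether this converges to useful partitions in practice would need to be tested.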

Appendix A: Complexity in a clique 

We here aim to determine the expected number of 
moves in the random neighbor algorithm. We assume 
it is always beneficial to move a node to a larger community. 
In other words, whenever we select a random node 
$i$, and a random neighbor $j$, and the community $\sigma_j$ of 
the random neighbor is larger than the community $\sigma_i$ of 
node $i$, i.e. if $|V_{\sigma_j}| > |V_{\sigma_i}|$, we will move the node. 

Let us denote by $f_k$ the number of communities that 
have size $k$ 

$$f_k = |\{c \mid |V_c| = k\}|. \tag{A1}$$

Then $g_k = k f_k$ denotes the number of nodes that belong 
to a community of size $k$. Additionally, define 
$F_k = \sum_{i=k}^{n} f_i$ the number of communities that have size 
$k$ or larger. Similarly, define $G_k = \sum_{i=k}^{n} g_i$ the number of 
nodes in communities that have size $k$ or larger. Clearly 
$\sum_k g_k = n$ so that $G_1 = n$. Also, $\sum_k f_k = r$ denotes 
the number of communities. The probability to select 
a node from a community of size $k$ is then simply $\frac{g_k}{n}$. 
Let us denote by $X_{cd}$ the event of moving a node from a 
community of size $c$ to a community of size $d$. Then the 
probability of $X_{cd}$ is 

$$\Pr(X_{cd}) = \begin{cases} \frac{g_c}{n} \frac{g_d}{n} & \text{if } c < d \\ \frac{g_c}{n} \frac{g_c - c}{n} & \text{if } c = d \\ 0 & \text{if } c > d \end{cases} \tag{A2}$$

The probability to move it from any community to any 
other is then $\sum_{cd} \Pr(X_{cd})$. Alternatively, it is easy to 
see that the probability to move to any other community 
is $\frac{G_k - k}{n}$ (where we subtract $k$ to make sure it moves to 
another community, and not the same community). So, 
overall, the probability we will move a node is then 

$$\Pr(\text{move}) = \sum_{k=1}^{n} \frac{g_k}{n} \frac{G_k - k}{n}. \tag{A3}$$

Similarly, the probability we will not move a node to 
another community (i.e. we remain stuck in the same 
partition) is then 

$$\Pr(\text{not move}) = \sum_{k=1}^{n} \frac{g_k}{n} \frac{n - G_k + k}{n}. \tag{A4}$$

We only reduce the number of communities if we move a 
node from a community of size 1 of course. Hence, the 
probability to reduce the number of communities by 1 is 
then 

$$\Pr\Big(\sum_k f_k = r - 1\Big) = \frac{g_1}{n} \frac{G_1 - 1}{n}. \tag{A5}$$

Now it would be possible to construct a complete transition 
network from any of the partitions to other partitions. 
However, this quickly becomes intractable, and 
rather difficult to solve. 
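
For a given list of community sizes, the quantities defined above are easy to evaluate numerically. The sketch below computes the probability of moving a node and the probability of reducing the number of communities by 1, following the definitions of $f_k$, $g_k$ and $G_k$. The function name and the input format are assumptions made for this illustration.

```python
from collections import Counter


def move_probabilities(sizes):
    """Return (Pr(move), Pr(reduce by 1)) for a list of community sizes."""
    n = sum(sizes)
    f = Counter(sizes)                        # f[k]: communities of size k
    g = {k: k * fk for k, fk in f.items()}    # g[k]: nodes in such communities

    def G(k):
        # Number of nodes in communities of size k or larger.
        return sum(gi for i, gi in g.items() if i >= k)

    p_move = sum(g[k] * (G(k) - k) for k in g) / n ** 2
    p_reduce = g.get(1, 0) * (G(1) - 1) / n ** 2
    return p_move, p_reduce


# Hypothetical example: two singletons and one community of size 2.
p_move, p_reduce = move_probabilities([1, 1, 2])
```

In this example only the two singleton nodes can move, so both probabilities equal $\frac{2}{4} \cdot \frac{3}{4} = 0.375$.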


Instead, we suggest to group partitions by the number 
of communities. Then, we divide the process into different 
phases. The algorithm is in phase $i$ whenever 
there are $n - i + 1$ communities. In other words, in the first 
phase there are $n$ communities, and the next phase starts 
whenever one of these nodes is put in another community. 
In the penultimate phase, there are only two communities. 
Let us then denote by $t_i$ the number of moves during 
phase $i$, and by $t$ the total number of moves during the 
whole process. We would like to examine $E(t)$, which we 
can write out as $\sum_i E(t_i)$ by linearity of expectation. 

The number of ways to partition a set of $n$ nodes in 
$r$ sets can be denoted by $p_r(n)$, which is known as the 
partition function in number theory [41]. This function 
obeys the recursive identity 

$$p_r(n) = p_r(n - r) + p_{r-1}(n - 1), \tag{A6}$$

since there are $p_{r-1}(n - 1)$ partitions with at least one 
community of size 1 and $p_r(n - r)$ partitions in which all 
communities have size at least 2 (since we put $r$ nodes in 
each one of the $r$ communities). We would then like to 
know how many partitions there are that have $s$ communities 
of size 1. Let us first define $q_k(n)$ to denote the 
number of ways to partition $n$ into $k$ sets with at least one 
community of size 1. Secondly, we need its counterpart 
$u_k(n)$ which denotes the number of ways to partition $n$ 
into $k$ sets without any community of size 1. Obviously 
then $p_k(n) = q_k(n) + u_k(n)$. We can then derive the 
recursion 

$$q_k(n) = \sum_{r=1}^{k-1} u_{k-r}(n - r) \tag{A7}$$

$$\phantom{q_k(n)} = \sum_{r=1}^{k-1} p_{k-r}(n - r) - q_{k-r}(n - r). \tag{A8}$$
The reasoning is as follows. If there are $r$ communities of 
size 1, we should know how many partitions there are of 
$n - r$ nodes into $k - r$ communities without using communities 
of size 1. Here $q_1(1) = 1$. More specifically, let 
us denote by $q_k(n, r)$ the number of partitions that have 
$r$ communities of size 1 and in total $k$ communities, using 
$n$ nodes. Then, obviously, $q_k(n) = \sum_{r=1}^{k-1} q_k(n, r)$. Moreover, 
$q_k(n, r) = u_{k-r}(n - r) = p_{k-r}(n - r) - q_{k-r}(n - r)$. 
The average probability to reduce the number of communities 
by 1 is then 

$$\Pr\Big(\sum_s f_s = k - 1 \,\Big|\, \sum_s f_s = k\Big) \tag{A9}$$

$$= \sum_{r=1}^{k-1} \frac{q_k(n, r)}{p_k(n)} \frac{g_1}{n} \frac{G_1 - 1}{n} \tag{A10}$$

$$= \sum_{r=1}^{k-1} \frac{q_k(n, r)}{p_k(n)} \frac{r}{n} \frac{n - 1}{n}. \tag{A11}$$



This expression is unfortunately not easy to evaluate analytically. 
Moreover, it incorrectly assumes that each partition 
is equally likely a priori, whereas we know that 
a more uneven distribution of community sizes is more 
likely due to the preferential attachment to the largest 
community. However, the following upper bound is immediate 

$$\Pr\Big(\sum_s f_s = k - 1 \,\Big|\, \sum_s f_s = k\Big) \leq \frac{n - k + 1}{n} \frac{n - 1}{n}. \tag{A12}$$

In general $E(t_i)$ can be calculated in a relatively straightforward 
way as 

$$E(t_i) = \frac{1}{\Pr(\text{reduce community by 1})}, \tag{A13}$$

which using the bound in Eq. (A12) leads to 

$$E(t_i) \geq \frac{n}{n - k + 1} \frac{n}{n - 1}, \tag{A14}$$

so that 

$$E(t) = \sum_{i} E(t_i) \tag{A15}$$

$$\geq \sum_{i=1}^{n-1} \frac{n}{n - i + 1} \frac{n}{n - 1} \tag{A16}$$

$$\approx n \sum_{i=1}^{n} \frac{1}{i} = O(n \log n). \tag{A17}$$





